Artificial Intelligence in the Life Sciences — Latest Matching Preprints

1

ReCo: a self-configuring and self-extending agentic framework for biomedical research

Tzanis, E.; Klontzas, M. E.

2026-07-16 health informatics 10.64898/2026.07.14.26358025 medRxiv

Top 0.7%

0.4%

This study presents ReCo (Research Cosmos), a self-configuring and self-extending agentic research framework for the biomedical domain. ReCo is orchestrated by a large language model that interacts with native computing tools, bundled Model Context Protocol (MCP) servers, structured skills, persistent project memory, and a desktop interface. Its bundled MCP servers provide biomedical analysis capabilities while serving as implementation paradigms for integrating new computational and AI frameworks. Structured skills encode procedures for environment configuration and framework ingestion, enabling ReCo to inspect repositories, manuscripts, or local codebases; identify dependencies and execution patterns; create isolated runtime environments; and design and implement MCP interfaces. Self-extension was evaluated using five heterogeneous systems: the Merlin computed tomography foundation model, MAISI-v2 medical image synthesis framework, asari liquid chromatography-mass spectrometry workflow, DosimeTron agentic radiation-dosimetry platform, and Orthanc DICOM server. ReCo successfully operationalized all five systems and completed predefined functional evaluations. Re-hosted DosimeTron outputs demonstrated near-perfect agreement with the reference pipeline across 651 organ observations (Pearson correlation and Lin concordance correlation coefficient, 0.99999; mean absolute percentage difference, 0.37%). Notably, ReCo configured Orthanc as a PACS-like coordination layer, integrated it with DosimeTron, Merlin, and TotalSegmentator, and orchestrated data retrieval, analysis, and return of valid DICOM RTSTRUCT, RTDOSE, and Structured Report objects. ReCo provides a unified environment for configuring, documenting, and operationalizing heterogeneous biomedical frameworks, reducing technical barriers to the adoption and integration of emerging computational and AI methods. The official open-source ReCo GitHub repository is available at: https://github.com/eltzanis/ReCo

2

LocusBlend: Flexible multi-index regional visualization of genomic association signals

Yang, C.; Cook, N.; Zeng, Y.; Fu, T.; Budde, J.; Cruchaga, C.; Belloy, M. E.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.15.26358129 medRxiv

Top 0.7%

0.4%

Summary: It has become standard practice to visualize regional signals from genome-wide association studies (GWAS) using LocusZoom plots. Similarly, GWAS signals are compared to regionally matched quantitative trait loci (QTLs), i.e., variant-to-gene regulation data, using LocusCompare plots to aid assessment of candidate trait-related genes. Despite broad usage, these tools annotate variants by linkage disequilibrium (LD) to a single lead or index variant. This single-index representation has limitations for visualizing complex loci that contain multiple independent signals. We present LocusBlend, an interactive web application for multi-index, LD-blended visualization of genomic loci. LocusBlend supports one or two genomic association summary-statistic datasets and one to three index variants, multi-index LocusZoom color-blended plots, and matching LocusCompare visualizations. Applications to Alzheimer's disease GWAS and QTL signals illustrate that LocusBlend enables visualization and separation of independent signals despite shared LD and high genomic complexity. Overall, LocusBlend is aimed at helping researchers handle the continuously expanding complexity of human genomics findings. Availability and Implementation: LocusBlend is freely available at https://locusblend.wustl.edu. Publication-ready plots are generated in 1 min. Source code, documentation, example datasets, input templates, and reproducibility instructions are available at https://github.com/BelloyLab/LocusBlend. LocusBlend is implemented in Python using Streamlit, Plotly, and PLINK. Supplementary Information: Supplementary data are available online.

3

A ReAct Agentic AI System for Natural Language Querying and Statistical Analysis of The Cancer Genome Atlas Clinical Data

Korutla, R.; Amal, S.

2026-07-17 health informatics 10.64898/2026.07.15.26358188 medRxiv

Top 0.9%

0.3%

The Cancer Genome Atlas (TCGA) holds clinical data for over 11,000 patients across 33 cancer types, but access is hard because of complex file structures, heterogeneous formats, and the need for programming. We present an agentic system for natural language querying and statistical analysis of TCGA clinical data. The system uses a large language model as an autonomous ReAct agent that selects from eight computational tools, including data extraction, descriptive statistics, Kaplan-Meier survival analysis with log-rank tests, hypothesis testing, and verification against the curated TCGA Pan-Cancer Clinical Data Resource (CDR). The agent reasons about intermediate results, adapts its approach, and returns clinically contextualized responses with source attribution and auditable traces. We introduce TCGA-Agent-Bench, 440 queries across five difficulty tiers with ground truth from the independently curated TCGA-CDR, evaluated with dual metrics of numerical accuracy and clinical completeness. The system achieves 93.4% overall accuracy (100% single-patient lookups, 99.1% cohort statistics, 92.8% comparative analyses), outperforming a fixed rule-based pipeline (87.1%), a single-pass LLM (81.8%), and retrieval-augmented generation (66.9% on a subset). Most of the benchmark is answerable from the CDR alone, so we locate the extraction layer's value in fields the CDR lacks (drug treatments, TNM components, biomarkers, biospecimen metadata): on 26 queries targeting these, the full system answers 100% versus 3.8% for CDR-only. Ablations show the reasoning loop is most impactful (+9.1% accuracy, +22.0 completeness points). A tool-based agentic architecture enables accurate, auditable analysis of clinical repositories, with value driven by tool design and recovered fields rather than model scale.
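One of the tools the abstract names is Kaplan-Meier survival analysis. As a rough illustration of what such a tool computes, the sketch below implements the standard product-limit estimator in plain Python. It is not the authors' implementation: the function name `kaplan_meier` and the toy cohort are assumptions made up for this example, and the log-rank test is not included.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # each patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censorings at t both shrink the risk set afterwards.
        at_risk -= exits[t]
    return curve


# Toy cohort with invented numbers (hypothetical, not TCGA data).
toy_times = [5, 8, 8, 12, 15, 20]
toy_events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(toy_times, toy_events)
```

Censored patients never trigger a drop in the curve but still count in the risk set up to their censoring time, which is what separates this estimate from a naive fraction of survivors.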

4

FootNet: A Multi-View Smartphone Dataset and Four-Model Benchmark for Clinical Foot Segmentation

Vijay, A.; Prabhune, A.; Srihari, V. R.; Rayampalli, A.

2026-07-17 health informatics 10.64898/2026.07.15.26358117 medRxiv

Top 1%

0.2%

We present FootNet, a 453-image multi-view smartphone foot dataset for binary foot segmentation, with expert-annotated masks across six anatomical views (dorsal, medial, and plantar, both left and right). We benchmark four segmentation models under a controlled protocol: U-Net with a MobileNetV2 encoder achieves the best performance (IoU 0.9268, Dice 0.9608, 95% CI [0.9209, 0.9320]); DeepLabV3 with MobileNetV3-Large scores IoU 0.8984 (Dice 0.9449); UNet++ with MobileNetV2 scores IoU 0.8913 (Dice 0.9391); and SAM ViT-B with oracle bounding-box prompt scores IoU 0.9219 on the matched 191-image subset. Bonferroni-corrected Wilcoxon signed-rank tests (k = 6 comparisons) show U-Net significantly outperforms DeepLab (p < 0.001, r = 0.638) and SAM ViT-B with oracle bounding box (p = 0.005, r = 0.202); UNet++ does not significantly differ from DeepLab (p = 0.062). Connected-component postprocessing yields negligible benefit (mean ΔIoU = +0.0003, 12 of 453 images improved). The extended dataset is available upon request.
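The benchmark above is reported in IoU and Dice. For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed for a single pair of binary masks. The function name `iou_and_dice`, the empty-mask convention, and the 3x3 toy masks are assumptions for illustration; the paper's evaluation code may differ.

```python
def iou_and_dice(pred, truth):
    """IoU and Dice for two binary masks given as equal-sized 2D lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for a, b in zip(p, t) if a and b)
    p_sum = sum(1 for a in p if a)
    t_sum = sum(1 for b in t if b)
    union = p_sum + t_sum - inter
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    return inter / union, 2 * inter / (p_sum + t_sum)


# Invented toy masks: 3 overlapping foreground pixels, 5 in the union.
toy_pred = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
toy_truth = [[0, 1, 1],
             [0, 1, 0],
             [0, 1, 0]]
toy_iou, toy_dice = iou_and_dice(toy_pred, toy_truth)
```

For a single image the two metrics are tied by Dice = 2·IoU / (1 + IoU). Averages over a dataset need not satisfy that identity exactly, because the mean of a nonlinear function is not the function of the mean.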

5

Encoding Discordance in the Alzheimer's Disease A/T/N Framework

DeLong, L. N.; Salimi, Y.; Balabin, H.; Galdi, P.; Fleuriot, J. D.; Brennan, P. M.; Alzheimer's Disease Neuroimaging Initiative,

2026-07-21 health informatics 10.64898/2026.07.19.26358425 medRxiv

Top 2%

0.2%

INTRODUCTION: The biomarker-based amyloid/tau/neurodegeneration (A/T/N) framework has become a popular staging method for Alzheimer's disease (AD) research. Previous studies use the framework either as a rule-based or data-driven approach but typically sacrifice either adaptivity or interpretability. METHODS: We present an interpretable, hybrid method, called Neurosymodal Data Fusion, for predicting incident AD in the ADNI dataset. Specifically, we encode the A/T/N framework as a logic program, where the input biomarker features are extracted by one or more neural networks. RESULTS: Our pipeline predicted four-year incident AD with a sensitivity of up to 0.84. Additionally, our models learned scores for each A/T/N profile, denoting its relative importance to model predictions. These scores also indicated that empirically derived cut-off values for the A and T criteria might be uninformative for the ADNI data. DISCUSSION: Our pipeline provides a novel way to use the A/T/N framework that could potentially improve early AD screening years before clinical manifestations.

6

In Silico Trial Simulation with Artificial Intelligence-Generated Synthetic Control Cohorts Reproduces Results of a Randomized Controlled Trial in Acute Myeloid Leukemia

Kumar Reddy, K.; Hahn, W.; Winter, S.; Roellig, C.; Mueller-Tidow, C.; Serve, H.; Baldus, C. D.; Fransecky, L.; Schliemann, C.; Burchert, A.; Schaefer-Eckart, K.; Kaufmann, M.; Schetelig, J.; Bornhaeuser, M.; Middeke, J. M.; Eckardt, J.-N.

2026-07-16 health informatics 10.64898/2026.07.15.26358123 medRxiv

Top 2%

0.1%

Rising costs, slow accrual and molecular substratification of cancers necessitate novel clinical trial designs. We demonstrate that artificial intelligence-generated synthetic patients can replace real controls to reproduce results of the SORAML trial. Using external multimodal data from 1,377 acute myeloid leukemia (AML) patients from previous trials and a real-world registry, we fine-tuned a tabular foundation model to generate synthetic patients, reproducing clinical and genetic features and outcome associations. Synthetic patients were then matched to the original SORAML intervention group using Cox risk scores, replacing the original control and reproducing the original trial result with near-identical median event-free survival (EFS) and treatment effect (original hazard ratio [HR] 0.64, 95%-confidence interval [CI] 0.47-0.87, p=0.004; with synthetic control HR 0.66, 95%-CI 0.48-0.90, p=0.009). Our findings demonstrate that AI-generated synthetic patients can serve as statistically rigorous controls supporting novel trial designs.

7

CuGen: A GPU-accelerated framework for large-scale genomics

Kiiskinen, T.; Richland, J.; Wang, W.; Lu, W. S.; Balasubramanian, N.; Hastie, T.; Tibshirani, R.; Rivas, M. A.

2026-07-17 genetic and genomic medicine 10.64898/2026.07.15.26358178 medRxiv

Top 2%

0.1%

Biobank-scale genomic analyses remain computationally expensive, CPU-bound workflows, particularly when adjusting for confounding. Here, we present CuGen, a GPU-accelerated framework for large-scale genomics. CuGen uses UltraLasso, a novel hierarchical application of univariate-guided sparse regression (uniLasso), to select a compact, phenotype-informed active set of fewer than 30,000 variants. This achieves robust leave-one-chromosome-out (LOCO) confounding control, enabling both downstream GWAS and in-sample fine-mapping. Additionally, we introduce the .cugen file format, a genotype representation designed for memory-optimized, high-throughput streaming and random access on GPU hardware. Building on this substrate, we provide a general GPU-accelerated genomics toolkit handling polygenic prediction, data manipulation, quality control, analysis, and visualization. We demonstrate CuGen's efficacy in the UK Biobank with up to 408,624 individuals, where the full GWAS pipeline and fine-mapping against 6.8 million imputed variants completes in approximately 10 minutes on a single high-throughput GPU with 80 GB of memory. The pipeline scales efficiently to massive phenome-wide analyses with sublinear resource consumption.

8

MedZone Embedder: a framework for representation learning of Japanese secondary medical care areas from a national ICU registry, characterizing intensive care provision structure and regional vulnerability

Ohno, K.; Hashimoto, S.

2026-07-20 health informatics 10.64898/2026.07.17.26358373 medRxiv

Top 2%

0.1%

Background: In Japan, acute inpatient care is divided into approximately 335 secondary medical care areas, which serve as the basic units for planning healthcare delivery systems under the 8th National Health Care Plan. While comparisons between regions and facilities typically rely on a single risk-adjusted metric, this approach confuses differences in patient demographics with differences in the actual infrastructure of intensive care units (ICUs). This paper presents a framework - MedZone Embedder - for deriving data-driven indicators of regional structural vulnerability by mapping secondary medical care areas onto a learned similarity space, together with its working implementation. The paper sets out the concept, the method, a proof of concept, and an explicit staged validation program, rather than national empirical results. Methods: Each area is represented by a feature vector consisting of aggregated values of intensive care provision indicators derived directly from the Japan Intensive Care Patient Database (JIPAD) - specifically, risk-adjusted mortality rates (standardized mortality ratios and an in-hospital composite indicator), technical efficiency, length of stay, readmission rates, case severity, and case composition - with the within-area variance of these indicators also taken into account. No hierarchical processing by facility type is performed. A contrastive autoencoder (multilayer perceptron encoder 32 -> 16 -> 8, symmetric decoder) is trained by self-supervised learning, using an objective function that combines reconstruction and normalized temperature cross-entropy (NT-Xent) on noise-augmented views. The resulting 8-dimensional embedding supports area searches based on cosine similarity and anomaly scoring in the embedding space (using isolation forest, Mahalanobis distance, or k-nearest-neighbor density), which is normalized to a vulnerability score ranging from 0 to 1. 
If deep learning libraries are unavailable, or if the number of areas is small, an alternative method using deterministic principal component analysis is employed. Results: This method was implemented and deployed within an operational ICU decision support system on a managed cloud platform. The proof of concept (PoC) is structured around five secondary medical care areas within Kyoto Prefecture and runs entirely on synthetic facility-level aggregate data constructed to follow the JIPAD indicator schema; no registry data were accessed. It generated: an aggregate provision profile for each area; an area embedding space equipped with a similar-area search function; and a vulnerability ranking that identifies areas with low patient numbers and low diversity that exhibit overall poor outcomes. At this scale, the contrastive autoencoder falls back to principal component projection. The deep learning pathway has been implemented and unit testing has been completed; training and evaluation on actual registry data are pending data-use approval and the expansion of data integration. Validation is staged: Stage 2 will train the contrastive pathway over JIPAD-covered areas to assess construct validity against public structural indicators (ICU/HCU beds, population, accessibility), and Stage 3 will extend coverage to all areas via National Database (NDB) linkage. Conclusion: MedZone Embedder reframes regional comparison from single-indicator ranking to structural representation: which areas are alike, and which are structural outliers. The contribution of this paper is the framework - the proposal that the intensive care provision structure of Japanese secondary medical care areas can be learned from a national outcomes registry and read through the lens of what we call institutional debt - together with a deployed implementation and a pre-specified validation program. 
To our knowledge, this is a candidate first application of contrastive representation learning to Japanese secondary medical care areas.
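The Methods describe a contrastive objective based on NT-Xent over noise-augmented views. The sketch below shows the usual form of that loss in plain Python, to make concrete what "pulling two views of the same area together" means. The function names, the temperature of 0.5, and the 2-dimensional toy embeddings are assumptions for illustration; the abstract does not give the temperature or the weighting against the reconstruction term, and none of this is the authors' code.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss; view_a[i] and view_b[i] embed two views of item i."""
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same item
        logits = {
            j: _cosine(z[i], z[j]) / temperature
            for j in range(2 * n)
            if j != i  # a sample is never contrasted with itself
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        # Cross-entropy of picking the positive among all other samples.
        total += log_denominator - logits[pos]
    return total / (2 * n)


# Invented embeddings: each view_b row is a slightly perturbed view_a row.
toy_view_a = [[1.0, 0.0], [0.0, 1.0]]
toy_view_b = [[1.0, 0.1], [0.1, 1.0]]
loss_aligned = nt_xent(toy_view_a, toy_view_b)
# Pairing each item with the wrong partner should be penalized more.
loss_mismatched = nt_xent(toy_view_a, list(reversed(toy_view_b)))
```

The loss is lowest when each embedding is closer to its own augmented view than to every other area in the batch, which is why it needs more than a handful of areas to be informative.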

9

Critically Ill Children Frequently Receive Medications with Established but Unused Pharmacogenomic Guidelines: Actionable Findings from an Integrated Electronic Medical Record and Exome Sequencing Study

Lynch, N.; Elefant, N.; Revah-Politi, A.; Geneslaw, A. S.; Beckett, J.; Wall, J. B.; Aguilar Breton, C.; Sabatello, M.; Kernie, S. G.; Bayir, H.; Gharavi, A. G.; Motelow, J. E.

2026-07-20 genetic and genomic medicine 10.64898/2026.07.16.26358240 medRxiv

Top 3%

0.1%

Importance: Pharmacogenomic (PGx) guidelines can improve medication efficacy and reduce toxicity, but their application in pediatric intensive care units (PICUs) remains largely unexplored. Objective: To determine the frequency of medications with established PGx guidelines administered in the PICU and assess the capacity of exome sequencing to capture PGx phenotypes for these medications. Design: Retrospective cohort study integrating electronic medical record and exome sequencing data. Setting: Morgan Stanley Children's Hospital of NewYork-Presbyterian, a single-center tertiary care children's hospital. Participants: A total of 4,939 children admitted to the PICU (2020-2024), and 192 children admitted to the PICU who underwent exome sequencing for research purposes (2015-2023). Exposure: Critical illness requiring PICU admission. Main Outcomes and Measures: Frequencies of administration of medications with established PGx guidelines in the PICU and the proportion of individuals with exome sequencing with identifiable PGx phenotypes. Results: Among 4,939 PICU patients, 37.2% (n=1,837) received at least one medication with established PGx guidelines and 14.4% (n=712) received two or more such medications. Twenty PGx genes were implicated; CYP2C9 was most common (17.3%, n=853). An estimated 8.2% of patients received medications for which PGx-guided recommendations would have altered clinical management. Among 192 patients who underwent exome sequencing, at least one metabolizer phenotype was identified in 62% (n=119). Conclusions and Relevance: Many critically ill children receive medications with established PGx guidelines. This study highlights an opportunity for more personalized medicine for critically ill children admitted to a tertiary care hospital and assesses the strengths and weaknesses of exome sequencing to uncover pertinent PGx phenotypes.

10

Genetic Counselor Utilization Across Non-Genetics Departments for Neurodevelopmental Disorders

Cole, J. J.; Cohen, J. S.; Sahin, M.; Srivastava, S.; Campbell, C. A.

2026-07-21 genetic and genomic medicine 10.64898/2026.07.20.26358492 medRxiv

Top 3%

0.1%

IMPORTANCE: Most United States children with neurodevelopmental disorders have not received genetic testing aligned with current guidelines. Integration of genetic counselors into non-genetics departments is a potential strategy to improve uptake, but prevalence and details of integrated care models are unknown. OBJECTIVE: To characterize availability, utilization, and perceived need for genetic counselors across non-genetics departments caring for patients with neurodevelopmental disorders. DESIGN: Cross-sectional observational department-level survey. SETTING: Child neurology, adult neurology, developmental pediatrics, child psychiatry, and adult psychiatry departments at Intellectual and Developmental Disabilities Research Centers. PARTICIPANTS: The survey was distributed to 67 departments across 15 institutions. The departmental response rate was 52% (35/67), with at least one response from 87% (13/15) of institutions. EXPOSURE: Presence/absence of dedicated genetic counselor(s), where "dedicated" was defined as hired by the department. MAIN OUTCOME(S) AND MEASURE(S): This was a descriptive study only, with no comparative statistical analyses due to the exploratory nature. RESULTS: One third of departments (34%; 12/35) reported having dedicated clinical genetic counselors. Prevalence was highest in child neurology (67%; 8/12), followed by adult neurology (40%; 2/5) and developmental pediatrics (22%; 2/9), with none in child psychiatry (0/7) or adult psychiatry (0/2). In almost all departments with genetic counselors (92%; 11/12), they directly billed for their services, which universally included pre-test counseling/consent and post-test counseling. In departments without genetic counselors, only 39% (9/23) reported providers ordered their own genetic testing. Among all departments, over half (57%) were interested in adding/increasing genetic counseling support, while 26% were unsure and 17% uninterested.
Insufficient funding was the most cited barrier; only one department reported insufficient need. CONCLUSIONS AND RELEVANCE: Though currently implemented in only one third of departments, our findings suggest those with dedicated genetic counselors directly pursue genetic testing (without referring to genetics) more than those without genetic counselors. Interest in increasing or adding genetic counseling support was high, and though funding was a reported barrier, feasible funding models were described. In the context of limited medical geneticists and expanding precision therapies, alternate delivery models for neurodevelopmental genetic testing including genetic counselor integration in non-genetics departments may help to scale and sustain uptake.

11

Gradient-guided adapter merging for neuroimaging vision-language models

Bit, S.; Guney, O. B.; Jia, S.; Kolachalama, V. B.

2026-07-21 health informatics 10.64898/2026.07.18.26358397 medRxiv

Top 3%

0.1%

Automated interpretation of neuroimaging studies requires simultaneous assessment of multiple imaging evidence variables, each tied to distinct anatomical structures. Vision-language models (VLMs) offer a unified framework for multi-task analysis, but adapting pre-trained VLMs remains challenging. Full fine-tuning is computationally prohibitive, and joint multi-task training requires simultaneous access to all task data, which is often infeasible in clinical settings. Although model merging enables multi-task composition without joint re-training, existing methods focus on post-hoc algorithms with limited extension to VLMs and minimal application to neuroimaging. Here, we present GRadient-guided Adapter Merging (GRAM), a layer-selective low-rank adaptation (LoRA)-based fine-tuning and merging framework for multi-task neuroimaging visual question-answering (VQA). GRAM uses a gradient ratio that contrasts class-specific gradients to identify task-discriminative layers, and applies subspace-constrained projected gradient descent to restrict LoRA updates to directions consistent with the geometry of the pre-trained model. We leveraged a structured VQA benchmark, developed from the National Alzheimer's Coordinating Center (NACC) dataset, that pairs multi-sequence brain MRI studies with question-answer pairs across clinically relevant imaging evidence variables. Experiments on the VQA benchmark showed that GRAM outperformed or matched all-layer LoRA fine-tuning and a standard merging baseline while reducing inter-task interference during merging, and approached or surpassed the performance of joint multi-task training without joint re-training.

12

Exploration of the molecular origins of sex-specific and temporal comorbidity patterns in dementia: insights from the Austrian claims data

Kovacevic, V.; Basaragin, B.; Kovacevic, J.; Zecevic, A.; Danilo Lombardo, S.; Dervic, E.

2026-07-16 genetic and genomic medicine 10.64898/2026.07.14.26357961 medRxiv

Top 4%

0.1%

Dementia is a progressive condition that impairs cognitive processes such as memory, decision making, and the ability to manage daily activities. Recent estimates suggest that more than half of all dementia cases could be preventable by addressing their risk factors, including disease comorbidities such as diabetes and vision loss. Yet, we lack a comprehensive molecular map of dementia comorbidities. In this work, we analyzed Austrian nationwide hospital claims data, comprising 13 million hospital stays from 2015 to 2019, to systematically assess dementia-related risk across disease comorbidity patterns, covering both their molecular relationships and their epidemiological overrepresentation. We identified disease trajectories occurring before and at the time of dementia diagnosis, revealing both sex-specific and shared comorbidity patterns. Overall, we identified 51 potential risk factors, with a prominent contribution from endocrine and metabolic disorders. While Parkinson's disease emerged as a strong molecularly related driver of dementia, we also identified emerging and previously undercharacterized risk factors, including vitamin D deficiency. This integrative framework provides a comprehensive view of dementia-associated disease networks and identifies novel, potentially modifiable risk factors. These results offer new opportunities for targeted prevention strategies and advance our understanding of the complex interplay between comorbidities and dementia development.

13

Implementation of a standardized Video-based Asynchronous Neurological Examination (VANE) in a multi-center observational study of Alzheimer's disease (AD) and AD related dementias

Noble, J. M.; Nadkarni, N. K.; Martinez, D.; Temprosa, M.; Bowers, A.; Carmichael, O.; Doherty, L.; Febres, G. J.; Sanchez, D. L.; Goldberg, T. E.; Sherif, H.; Shah, V.; Luchsinger, J. A.; DPP Research Group,

2026-07-17 epidemiology 10.64898/2026.07.15.26357456 medRxiv

Top 4%

0.0%

Introduction: The Diabetes Prevention Program Outcomes Study (DPPOS) is an established cohort of aging persons with pre-diabetes and type 2 diabetes with 25 years of median follow-up. In 2022, DPPOS added Alzheimer's disease (AD) and AD-related dementias (ADRD) phenotyping using the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDSv3), which included a standardized neurological examination across 25 clinical sites, administered by clinical staff and interpreted centrally by clinicians. Methods: A DPPOS video-based asynchronous neurological examination (DPPOS-VANE) was developed iteratively through consensus from research clinicians and staff feedback to harmonize with UDSv3 to identify common neurological diagnoses aside from dementia, including diabetic cranial neuropathies, stroke, and parkinsonism. DPPOS-VANE was designed to be conducted without direct participant contact by the examiner, reproducible, and independent of clinical skills of PCs. An iPad camera recorded the video exam, comprising assessments of extraocular and facial movements, visual fields, speech, gross motor strength, pronator drift, praxis, and parkinsonism. A 10-minute training video demonstrated the examination step-by-step with scripts and instructions in English and Spanish. Site-specific performance review, feedback, and staff certification preceded central reading of video recordings by physicians. After two years of implementation, 1286 DPPOS-VANEs led to 1284 examination reviews. Of these, 1204 (93%) were completed by having the examiner follow the standard script. Overall, 1237 examinations (96%) were delivered as planned, 41 (3%) had minor errors but were still usable, and 6 (0.4%) had major deviations in exam technique; two additional recorded evaluations were not usable because the recorded videos were inaccessible due to technical errors. Each examination was completed within 10-15 minutes. Each site on average completed 51.4 examinations (range 14-92).
Discussion: Engaging 55 research staff across 25 sites and 3 physician-reviewers, this study is the first to demonstrate feasibility of a VANE as an efficient neurological examination model enabled by commonly used devices. Such a multisite standardized VANE represents a novel paradigm for large epidemiological studies.

14

Escalate or Switch? Treating the Post-Titration GLP-1 Non-Responder: A Target Trial Emulation With Dose-Equivalence Reclassification

Erly, B.; Raja, S.

2026-07-16 epidemiology 10.64898/2026.07.14.26357491 medRxiv

Top 4%

0.0%

Background. When a GLP-1 patient stops responding, should the clinician push the dose or change the drug? Observational answers conflate two distinct sources of confounding. Most early-week "escalations" in real-world data are FDA-mandated titration steps rather than deliberate clinical decisions, and patients who deviate do so for reasons we cannot observe. Semaglutide and tirzepatide are also not equivalent milligram-for-milligram, so naive class-switch comparisons mix mechanism and dose. We resolve both by restricting to post-titration patients and reclassifying treatments under the Whitley 2023 dose-equivalence framework. Methods. From 68,969 telehealth GLP-1 patients we built a post-titration cohort. Each patient's index time is the day they completed at least four weeks at therapeutic dose (Whitley tier 3 or higher: semaglutide 1.0 mg or tirzepatide 5 mg). Confirmed slow response is less than 5% total weight loss at the index, consistent with FDA weight-management drug-development guidance and AACE/ACE criteria. We compared four post-index strategies against continuing the current regimen: within-class dose escalation, equipotent class switch (a Whitley tier change of 1 or fewer), and class switch with potency increase. Direction-specific analyses split switches into semaglutide-to-tirzepatide and tirzepatide-to-semaglutide arms. Outcomes were percent weight loss at 12 and 24 weeks post-index. We estimated effects six ways: propensity-score matching; IPTW with linear and gradient-boosted propensities; the g-formula with linear and gradient-boosted outcome models; and AIPW, the doubly-robust estimator we use as the tiebreaker. Two-layer inverse probability of censoring weighting addressed strategy adherence and outcome ascertainment. We computed E-values, ran a negative-control specification, and stratified by tolerability. Results. The post-titration cohort comprised 24,876 confirmed slow responders. 
Within-class dose escalation produced a small consistent benefit at 24 weeks: AIPW +0.64 pp (95% CI +0.16 to +1.12), with five non-AIPW estimators ranging +0.47 to +0.76 pp. The continue arm itself lost an additional 8.07 pp over the same window (96% continued to lose), so escalation is a marginal addition to a substantial natural slope, not a rescue. Equipotent class switching from semaglutide to tirzepatide was inconclusive: linear and matching estimators ranged +1.26 to +1.80 pp, but AIPW was -0.33 pp (95% CI -1.27 to +0.60) with only 90 treated patients and limited propensity-score overlap. Class switch with simultaneous potency increase (sema to tirz) gave AIPW +0.65 pp at 12 weeks (95% CI +0.37 to +0.92, n = 80). A negative-control specification yielded ATE -0.13 pp, indicating the pipeline did not generate spurious signal. A held-out-fold prognostic-threshold sensitivity gave a null effect (+0.04 pp), correcting an earlier circular +1.06 pp estimate. Conclusions. Among confirmed post-titration slow responders, within-class dose escalation adds approximately 0.6 percentage points at 24 weeks on top of an 8 percentage point natural slope, consistently across six estimators including doubly-robust inference. This headline effect is small and not robust to modest unmeasured confounding (E-value 1.27) or to MNAR-style outcome attrition (tipping point delta approximately 1.2 pp), so it should be read as hypothesis-generating rather than practice-changing. Class-switching evidence is inconclusive; linear-estimator results suggesting benefit did not survive doubly-robust estimation in small treated samples with limited propensity overlap. The dose-ladder framework, with phase-specific evidence grading, is hypothesis-generating and insufficient on its own to change practice.
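The abstract leans on AIPW as its doubly-robust tiebreaker. The sketch below shows the textbook AIPW estimator of an average treatment effect in plain Python, to illustrate why it can disagree with purely outcome-model or purely weighting-based estimates. The function name `aipw_ate` and all toy numbers are invented for this example. It omits the censoring weights, cross-fitting, and confidence intervals the study describes, and is not the authors' pipeline.

```python
def aipw_ate(y, t, e, m1, m0):
    """Augmented inverse-probability-weighted estimate of the ATE.

    y  -- observed outcomes
    t  -- 1 if the patient followed the strategy, 0 if comparator
    e  -- estimated propensity P(t = 1 | covariates), strictly between 0 and 1
    m1 -- outcome-model prediction for each patient under the strategy
    m0 -- outcome-model prediction for each patient under the comparator
    """
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        total += (
            m1i - m0i                              # outcome-model contrast
            + ti * (yi - m1i) / ei                 # weighted residual, treated
            - (1 - ti) * (yi - m0i) / (1 - ei)     # weighted residual, comparator
        )
    return total / len(y)


# Invented toy data (hypothetical percent weight loss, not study data).
toy_y = [10.0, 12.0, 8.0, 9.0]
toy_t = [1, 1, 0, 0]
toy_e = [0.5, 0.5, 0.5, 0.5]

# Outcome model that reproduces every observed outcome: residuals vanish.
ate_good_model = aipw_ate(toy_y, toy_t, toy_e,
                          m1=[10.0, 12.0, 10.5, 11.5],
                          m0=[8.0, 10.0, 8.0, 9.0])

# Uninformative outcome model: the estimator falls back to plain IPW.
ate_ipw_only = aipw_ate(toy_y, toy_t, toy_e,
                        m1=[0.0] * 4, m0=[0.0] * 4)
```

The estimate stays consistent if either the propensity model or the outcome model is correctly specified. The residual terms are divided by the propensity, so with few treated patients and poor overlap they can dominate, which is consistent with the wide class-switch intervals reported above.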

15

Multilevel Factors Associated with Nonresponse to Patient-Reported Outcome Measures in Routine Radiation Oncology Care

Liu, J. B.; Chen, Y.-J.; Edelen, M. O.; Pusic, A. L.; Martin, N. E.; Zeng, C.

2026-07-17 health systems and quality improvement 10.64898/2026.07.15.26358162 medRxiv

Top 4%

0.0%

Purpose: Nonresponse to routinely collected patient-reported outcome measures (PROMs) threatens the representativeness of aggregated data. We characterized patient-, provider-, and clinic-level factors associated with PROMIS Global-10 nonresponse in routine radiation oncology care. Methods: In this retrospective cohort study, all adults seen at five Mass General Brigham radiation oncology clinics over one year were included. The primary outcome was patient-level nonresponse, defined as never completing the portal-administered Global-10 versus completing it at least once. Using iterative mixed-effects logistic regression, we modeled patient-, provider-, and clinic-level factors. Results: Among 12,214 patients, 71 providers, and five clinics, patient- and appointment-level response rates were 35.4% and 10.9%, with patient-level response ranging nearly fivefold across clinics (12.8% to 66.2%). In Model 1, male sex, lower education, not working, and recent surgery had higher odds of nonresponse, and longer time since diagnosis lower odds. After provider- and clinic-level factors were added, patient sex, education, and employment became nonsignificant, whereas recent surgery (adjusted odds ratio [aOR] 1.97) and longer time since diagnosis (aOR 0.46 for >12 months) persisted. A provider's historical collection rate was protective but attenuated at the clinic level. There, a later program launch (aOR 0.29) and higher historical collection rate (aOR 0.79) correlated with lower nonresponse, whereas academic versus community setting did not. Conclusions: Nonresponse to routinely collected PROMs is a multilevel phenomenon driven substantially by clinic-level implementation factors, not patient characteristics alone. Because response rate is only a proxy for representativeness, PROMs programs and PRO-based performance measures should prioritize representative collection over volume.

16

Rationale and guidance for implementing the continual reassessment method for dose-finding in controlled human infection model studies

Weerasinghe, C.; Osowicki, J.; Simpson, J. A.; Crocker-Buque, T.; McCarthy, J.; Williams, E.; Price, D. J.

2026-07-17 infectious diseases 10.64898/2026.07.16.26358128 medRxiv

Top 4%

0.0%

Controlled human infection models (CHIMs) are increasingly used in infectious disease research to study pathogen dynamics and evaluate interventions under controlled conditions. However, these studies are resource-intensive and involve ethical and safety constraints, making efficient study design critical. Dose-finding is a key early component in CHIMs, where the aim is to identify a challenge dose that achieves a target infection probability. Traditional rule-based designs are commonly used but can be inefficient, motivating the use of model-based adaptive approaches such as the Bayesian Continual Reassessment Method (CRM). Although CRM has been extensively studied and widely adopted in Phase I oncology trials for identifying the maximum tolerated dose of therapeutics, its application in CHIM settings remains limited, particularly when the endpoint of interest is infection. This tutorial provides step-by-step guidance for implementing a Bayesian CRM in dose-finding CHIMs, using an oropharyngeal Neisseria gonorrhoeae challenge as a motivating case study. The framework outlines key design components, including dose-grid specification, dose-response model, prior elicitation, Bayesian updating, decision rules, and stopping criteria, with particular emphasis on a clinically interpretable parameterisation. Trial operating characteristics are evaluated through simulation studies under multiple dose-response scenarios and prior-predictive analyses, and compared with a commonly used '3+3' type rule-based design. This work highlights the advantages of Bayesian model-based designs for dose-finding in CHIMs over classic rule-based designs and provides a structured, reproducible framework for implementing CRM, supporting their application in future CHIM studies.
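The Bayesian updating and decision-rule steps of a CRM can be sketched as follows. This is a hedged illustration only: it assumes the common one-parameter power model, a normal prior with standard deviation 1.34, and a made-up skeleton and target. The tutorial's own parameterisation, priors, and stopping criteria are not reproduced here.

```python
import numpy as np

def crm_posterior_probs(skeleton, doses_given, infected, prior_sd=1.34):
    """Posterior mean infection probability at each dose level.

    Assumed model: p_j(theta) = skeleton_j ** exp(theta), theta ~ N(0, prior_sd^2),
    with the posterior evaluated on a grid of theta values.
    """
    theta = np.linspace(-4.0, 4.0, 1601)
    log_prior = -0.5 * (theta / prior_sd) ** 2
    # log_p[k, j]: log infection probability at dose j under theta[k]
    log_p = np.exp(theta)[:, None] * np.log(np.asarray(skeleton, dtype=float))[None, :]
    p = np.exp(log_p)
    log_lik = np.zeros_like(theta)
    for d, y in zip(doses_given, infected):
        # Bernoulli likelihood for each challenged participant
        log_lik += log_p[:, d] if y else np.log1p(-p[:, d])
    log_post = log_prior + log_lik
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    return w @ p

def recommend_dose(post_probs, target):
    """Decision rule: dose whose posterior mean is closest to the target."""
    return int(np.argmin(np.abs(post_probs - target)))

# Hypothetical five-level dose grid and a target infection probability of 0.70
skeleton = [0.10, 0.30, 0.50, 0.70, 0.90]
prior_probs = crm_posterior_probs(skeleton, [], [])
# Three participants challenged at the lowest dose, none infected
post_probs = crm_posterior_probs(skeleton, [0, 0, 0], [0, 0, 0])
next_dose = recommend_dose(post_probs, target=0.70)
print(np.round(post_probs, 3), "next dose index:", next_dose)
```

Real designs usually add safeguards that this sketch leaves out, such as no dose skipping, cohort sizes, and formal stopping rules.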

17

Comparative Efficacy of Vancomycin and Fidaxomicin Regimens for the Prevention of Recurrent Clostridioides difficile Infection: A Systematic Review and Network Meta-Analysis of Randomized Controlled Trials

Prosty, C.; Butler-Laporte, G.; Brophy, J.; Frenette, C.; Loo, V.; Coburn, B.; Hota, S.; Longtin, Y.; Kong, L.; Muller, M.; Steiner, T.; Valiquette, L.; Daneman, N.; Daley, P.; Nott, C.; MacFadden, D. R.; Kandel, C.; Chen, Y.; Perez-Patrigeon, S.; Lee, T. C.; McDonald, E.

2026-07-17 infectious diseases 10.64898/2026.07.14.26358112 medRxiv

Top 4%

0.0%

Background and Aims: The optimal treatment for first episodes and first recurrences of Clostridioides difficile infection (CDI) is unknown, and there is emerging evidence for pulse and taper (P-T) regimens. Therefore, we sought to estimate the relative efficacy of treatment options. Methods: MEDLINE and CENTRAL were searched from database inception to May 21, 2025, and unpublished conference abstracts were searched from recent infectious disease conferences. RCTs on the treatment of first episodes or first recurrences of CDI comparing fixed-dose or P-T regimens of fidaxomicin or vancomycin were included. The primary and secondary outcomes were 40- and 56-day CDI recurrence, respectively. A random-effects network meta-analysis on the risk ratio (RR) scale was conducted using a standard regimen (10-14 days) of vancomycin as the comparator. Treatments were ranked using the surface under the cumulative ranking curve (SUCRA). Results: Eight RCTs were included, comprising a total of 2181 patients. For 40-day recurrence, fidaxomicin P-T had the highest probability of ranking best (RR=0.10, 95% confidence interval [95% CI]=0.10-0.49, SUCRA=1.00), followed by vancomycin P-T (RR=0.49, 95% CI=0.32-0.76, SUCRA=0.61), fixed-dose fidaxomicin (RR=0.61, 95% CI=0.49-0.76, SUCRA=0.39), and, finally, fixed-dose vancomycin (SUCRA=0.00). The treatments ranked in the same order for 56-day recurrence, though only 3 RCTs reported on this timepoint. Conclusion: Vancomycin P-T, fidaxomicin P-T, and fixed-dose fidaxomicin were all superior to fixed-dose vancomycin. Head-to-head comparative effectiveness RCTs are needed to quantify their relative effect sizes and impact on long-term prevention of recurrent CDI.
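For readers unfamiliar with SUCRA, the ranking step can be sketched from a matrix of rank probabilities. The probabilities and the helper name below are invented for illustration and are not taken from the meta-analysis; the sketch only shows why a treatment that is always ranked best scores 1 and one that is always ranked worst scores 0.

```python
import numpy as np

def sucra(rank_probs):
    """SUCRA per treatment from rank probabilities.

    rank_probs[k, j] is the probability that treatment k has rank j + 1
    (rank 1 = best); each row must sum to 1.
    """
    rank_probs = np.asarray(rank_probs, dtype=float)
    n_treatments = rank_probs.shape[1]
    # Cumulative probability of being among the top j ranks, for j = 1 .. a-1
    cumulative = np.cumsum(rank_probs, axis=1)[:, :-1]
    return cumulative.sum(axis=1) / (n_treatments - 1)

# Hypothetical rank probabilities for four treatments (rows) over four ranks (columns)
rank_probs = [
    [1.0, 0.0, 0.0, 0.0],   # always ranked best
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.4, 0.6, 0.0],
    [0.0, 0.0, 0.0, 1.0],   # always ranked worst
]
scores = sucra(rank_probs)
print(np.round(scores, 3))
```

Because SUCRA summarizes rank only, it says nothing about the size or precision of the underlying effects, so it is best read alongside the RRs and their intervals.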

18

Complex intra-host SARS-CoV-2 evolution following monoclonal antibody pre-exposure prophylaxis

Kamelian, K.; Pascall, D. J.; Cheng, M. T. K.; Meng, B.; Altaf, M.; Morse, R. M.; Aggio, J. B.; Egan, D. J. S.; Chen-Xu, M.; Trivioli, G.; Sutton, B.; Richter, A.; Gonzalez-Vazquez, L. D.; Cormie, C.; Kemp, S.; Yeadon, R.; Hyatt, B.; Wong, A.; Thesin Pelamkulangara, N.; Fraser, E.; McCarthy, B.; Novaes, F.; Stott, S.; Galvin, A.; Bellis, K. L.; De Angelis, D.; Harrison, E. M.; Martin, D.; Smith, R. M.; Gupta, R. K.

2026-07-17 infectious diseases 10.64898/2026.07.14.26356329 medRxiv

Top 4%

0.0%

Background: Monoclonal antibodies have emerged as a prophylactic strategy to prevent symptomatic SARS-CoV-2 infection in immunocompromised individuals. However, the evolutionary and clinical implications of breakthrough infections under this regime remain unclear. Methods: A male in their 80s with a haematological/oncological diagnosis received a 2000 mg intravenous infusion of sotrovimab in March 2023 and was diagnosed with COVID-19 by RT-qPCR from a nasopharyngeal swab in August 2023. Weekly samples (n=24) were collected through February 2024 (171 days). All samples underwent whole-genome sequencing, with select mutations subjected to functional assessment. Findings: Sequencing identified the GE.1 lineage at all timepoints. An intra-host recombination event in ORF1ab (positions 8942-12458) was detected prior to 23 weeks post-detection, followed by a 14-fold increase in viral load (7.42e+06 to 1.00e+08 RNA copies/mL) and a marked shift in the viral population. E340D, a sotrovimab resistance mutation, was detected at low abundance (46%) within the first week post-infection, fluctuated over time, and was nearly fixed by week 15 (107 days) post-detection. We assessed five spike mutations - V36M, S98F, and V213G in the N-terminal domain, Y505P in the receptor-binding domain, and P681Q near the S1/S2 cleavage site - and additionally evaluated the impact of E340D. V36M conferred the highest infectivity across all cell lines, with the most significant effect in low-TMPRSS2 cells. While all mutations showed enhanced infectivity with the addition of E340D, the effect was most pronounced in mutations with lower baseline infectivity. The addition of E340D significantly decreased relative neutralizing titres for V36M, S98F, and V213G, enabling escape from neutralizing antibodies in XBB-responsive individuals, illustrating an enhanced phenotypic advantage. 
Patient neutralizing activity was absent pre-sotrovimab, and sotrovimab-induced neutralization was further compromised by selection of E340D. Interpretation: Sotrovimab pre-exposure prophylaxis in an immunocompromised patient did not prevent SARS-CoV-2 infection and selected for the resistance mutation E340D, with unexpected fitness consequences across non-receptor-binding-domain spike regions.

19

Nationwide Mpox Genomic Surveillance Reveals Clade Ib Introductions, APOBEC3-Driven Evolution, and Terminal Deletions

Brochu, H. N.; Shi, Q.; Song, K.; Zhang, Q.; Munroe, J.; Harris, N. J.; Britt, N.; Zeng, Q.; Kapuria, K.; Chappell, J.; Norvell, B. M.; Peavy, L.; Williams, J. D.; Harris, A. B.; Chaitram, J.; Hutson, C. L.; Deng, J.; McGrath, D.; Boles, D.; Dale, S. E.; Gigante, C. M.; Iyer, L. K.

2026-07-17 infectious diseases 10.64898/2026.07.15.26357894 medRxiv

Top 4%

0.0%

Background: The 2022-2023 global mpox outbreak highlighted the critical need for robust genomic surveillance capabilities to track mpox virus (MPXV) evolution and transmission dynamics. Methods: Building upon our established SARS-CoV-2 sequencing infrastructure, we implemented a Molecular Loop probe-based long-read sequencing approach using Pacific Biosciences Sequel II technology for comprehensive MPXV genomic surveillance across the United States (US). From August 2024 to June 2025, we generated 326 high-quality whole genome sequences from residual mpox-positive clinical specimens collected by Labcorp across all 10 US Department of Health and Human Services regions. Results: Our analysis identified two samples containing clade Ib MPXV in January and June 2025 and captured shifting trends in clade IIb diversity, with 13 distinct lineages observed. We also identified multiple instances of large (~1.6-17.6 kb) deletions proximal to the inverted terminal repeats in clade IIb genomes. APOBEC3 mutation analysis indicated substantial evidence of human-to-human transmission in both clades. Further, we observed significantly higher APOBEC3-associated SNPs per kilobase (P<0.001) in clade IIb genomic variable regions relative to their central conserved region. Our assay exhibited strong reproducibility across biological replicates from individual patients, and accuracy was confirmed via parallel sequencing of select specimens by the US Centers for Disease Control and Prevention (CDC) using metagenomic sequencing. We also demonstrated via custom simulation that our assay discriminates all known MPXV clades and lineages, including those we have not observed in the US. Conclusions: Our integrated nationwide surveillance system facilitates real-time genomic tracking of outbreak evolution, with demonstrated capacity across SARS-CoV-2 and MPXV, positioning this platform for rapid deployment during future pathogen emergence.

20

Chart review and genetic validation of electronic medical record dementia diagnoses in VA: The impact of CMS data

Logue, M.; Lee, S. O.; Gillis, M.; Zhang, R.; Lee, M.; Marra, D.; Lopez, F. V.; Lynch, J.; Panizzon, M. S.; Tsuang, D. W.; Hauger, R. L.; The MVP Cognitive Decline and Dementia During Aging Working Group; VA Million Veteran Program; Merritt, V. C.

2026-07-17 health informatics 10.64898/2026.07.14.26358063 medRxiv

Top 4%

0.0%

Background: International Classification of Diseases (ICD) codes are often used in epidemiological studies to track disease rates over time. Objective: This evaluation of ICD-code-based algorithms for electronic medical record (EMR) studies of Alzheimer's disease (AD) and related dementias (ADRD) examines the impact of incorporating Centers for Medicare and Medicaid Services (CMS) data as an additional source of diagnostic and treatment information in Department of Veterans Affairs (VA) EMR studies. Methods: We performed a chart review of 100 VA Million Veteran Program (MVP) participants to evaluate algorithm performance. We also assessed genetic associations across algorithms in a large MVP cohort (n=396k). Results: Adding CMS data increased the number of detected cases, sensitivity, and positive predictive value, but decreased specificity and negative predictive value. Genetic analyses showed that broader (ADRD/dementia) algorithms with just VA data performed similarly to narrow (AD-focused) algorithms incorporating both VA and CMS ICD codes. Additionally, narrow AD algorithms based solely on VA data yielded the highest ORs, indicating the largest proportion of late-onset AD cases. Conclusions: We recommend a broad (ADRD) algorithm without CMS or medication data, particularly for epidemiological studies; a strict AD algorithm including CMS and medication cases for genetic discovery of late-onset AD associations in VA EMR; and a strict AD algorithm without CMS data for applications focused solely on AD and sensitive to misspecification. Careful evaluation of algorithm performance is warranted in different EMR systems, as ICD coding practices vary by institution, as demonstrated by this comparison of VA EMR and CMS data.
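The four performance measures reported for the chart review follow directly from a 2x2 table of algorithm calls against chart-confirmed diagnoses. The sketch below uses invented counts for a 100-chart review purely to show the arithmetic; the function name and numbers are assumptions, not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from 2x2 confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # chart-confirmed cases the algorithm flags
        "specificity": tn / (tn + fp),   # non-cases the algorithm leaves unflagged
        "ppv": tp / (tp + fp),           # flagged patients who are true cases
        "npv": tn / (tn + fn),           # unflagged patients who are true non-cases
    }

# Hypothetical counts summing to 100 reviewed charts
metrics = diagnostic_metrics(tp=30, fp=5, fn=10, tn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on how common true cases are among the sampled charts, so values from a review sample may not transfer to a cohort with a different case mix.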